Bayes Decision Rules and Confidence Measures for Statistical Machine Translation
Authors
Abstract
In this paper, we revisit the foundations of the statistical approach to machine translation and study two forms of the Bayes decision rule: the common rule for minimizing the number of string errors and a novel rule for minimizing the number of symbol errors. The Bayes decision rule for minimizing the number of string errors is widely used, but its justification is rarely questioned. We study the relationship between the Bayes decision rule, the underlying error measure, and word confidence measures for machine translation. The derived confidence measures are tested on the output of a state-of-the-art statistical machine translation system. An experimental comparison with existing confidence measures is presented on a translation task consisting of technical manuals.
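The difference between the two decision rules can be illustrated with a small sketch. It is not taken from the paper: it assumes a toy N-best list whose hypotheses all have the same length and whose posterior probabilities are already normalized, and the function names (`map_string`, `position_posteriors`, `symbol_rule`, `word_confidences`) are hypothetical. The string-error rule picks the single most probable sentence, while the symbol-error rule picks, at each position, the word with the highest summed posterior. That same position-wise posterior can serve as one simple word confidence measure.

```python
from collections import defaultdict

def map_string(nbest):
    """String-error rule: return the hypothesis with the highest posterior."""
    return max(nbest, key=lambda item: item[1])[0]

def position_posteriors(nbest):
    """Sum hypothesis posteriors for each (position, word) pair.

    Assumption for this sketch: all hypotheses have equal length.
    """
    length = len(nbest[0][0])
    post = [defaultdict(float) for _ in range(length)]
    for words, prob in nbest:
        if len(words) != length:
            raise ValueError("sketch assumes equal-length hypotheses")
        for i, word in enumerate(words):
            post[i][word] += prob
    return post

def symbol_rule(nbest):
    """Symbol-error rule: pick the most probable word at each position."""
    return [max(p, key=p.get) for p in position_posteriors(nbest)]

def word_confidences(nbest, hypothesis):
    """Position-wise posterior of each word in the given hypothesis."""
    post = position_posteriors(nbest)
    return [post[i][word] for i, word in enumerate(hypothesis)]

# Toy N-best list (invented data): (hypothesis, posterior probability).
nbest = [
    (("the", "manual", "opens"), 0.4),
    (("a", "guide", "opens"), 0.3),
    (("a", "guide", "starts"), 0.3),
]

best_string = map_string(nbest)    # most probable full sentence
best_symbols = symbol_rule(nbest)  # most probable word per position
confidences = word_confidences(nbest, best_string)
```

In this toy list the two rules disagree: the most probable sentence starts with "the", yet "a" carries more total posterior mass at the first position. A real system would need alignment or a windowed position match to handle hypotheses of different lengths, which this sketch leaves out.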
Similar resources
Error Measures and Bayes Decision Rules Revisited with Applications to POS Tagging
Starting from first principles, we revisit the statistical approach and study two forms of the Bayes decision rule: the common rule for minimizing the number of string errors and a novel rule for minimizing the number of symbol errors. The Bayes decision rule for minimizing the number of string errors is widely used, e.g. in speech recognition, POS tagging and machine translation, but its jus...
Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
A novel and robust approach to improving statistical machine translation fluency is developed within a minimum Bayes-risk decoding framework. By segmenting translation lattices according to confidence measures over the maximum likelihood translation hypothesis we are able to focus on regions with potential translation errors. Hypothesis space constraints based on monolingual coverage are applied...
Training and Evaluating Error Minimization Decision Rules for Statistical Machine Translation
Decision rules that explicitly account for non-probabilistic evaluation metrics in machine translation typically require special training, often to estimate parameters in exponential models that govern the search space and the selection of candidate translations. While the traditional Maximum A Posteriori (MAP) decision rule can be optimized as a piecewise linear function in a greedy search of ...
Training and Evaluating Error Minimization Rules for Statistical Machine Translation
Decision rules that explicitly account for non-probabilistic evaluation metrics in machine translation typically require special training, often to estimate parameters in exponential models that govern the search space and the selection of candidate translations. While the traditional Maximum A Posteriori (MAP) decision rule can be optimized as a piecewise linear function in a greedy search of ...
Estimation of Confidence Measures for Machine Translation
Confidence Estimation has been extensively used in Speech Recognition and now it is also being applied in Statistical Machine Translation. Its basic goal is to estimate a confidence measure for each word in a given hypothesis, in order to locate those words, if any, that are likely to be incorrectly recognised or translated. It can be seen as a two-class pattern recognition problem in which eac...